Large-Scale Discovery of Gene-Enriched SNPs

Authors

  • Michael A. Gore
  • Mark H. Wright
  • Elhan S. Ersoz
  • Pascal Bouffard
  • Edward S. Szekeres
  • Thomas P. Jarvie
  • Bonnie L. Hurwitz
  • Apurva Narechania
  • Timothy T. Harkins
  • George S. Grills
  • Doreen H. Ware
  • Edward S. Buckler
Abstract

Whole-genome association studies of complex traits in higher eukaryotes require a high density of single nucleotide polymorphism (SNP) markers at genome-wide coverage. To design high-throughput, multiplexed SNP genotyping assays, researchers must first discover large numbers of SNPs by extensively resequencing multiple individuals or lines. For SNP discovery approaches that rely on the short reads produced by next-generation DNA sequencing technologies, the highly repetitive and duplicated nature of large plant genomes presents additional challenges. Here, we describe a genomic library construction procedure that facilitates pyrosequencing of genic and low-copy regions in plant genomes, and a customized computational pipeline to analyze and assemble short reads (100–200 bp), identify allelic reference sequence comparisons, and call SNPs with a high degree of accuracy. With maize (Zea mays L.) as the test organism in a pilot experiment, the implementation of these methods resulted in the identification of 126,683 putative SNPs between two maize inbred lines at an estimated false discovery rate (FDR) of 15.1%. We estimated rates of false SNP discovery using an internal control, and we validated these FDR estimates with an external SNP dataset that was generated using locus-specific PCR amplification and Sanger sequencing. These results show that this approach has wide applicability for efficiently and accurately detecting gene-enriched SNPs in large, complex plant genomes.

The average nucleotide diversity of coding regions between any two maize (Zea mays L.) lines (π = 1–1.4%) is two- to fivefold higher than that of other domesticated grass crops (Buckler et al., 2001; Tenaillon et al., 2001; Wright et al., 2005). Moreover, it is not uncommon to find maize haplotypes more than 2% diverged from one another (Tenaillon et al., 2001; Wright et al., 2005), and divergence can reach 5% (Henry and Damerval, 1997).
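For a single pair of lines, the nucleotide diversity π quoted above reduces to the proportion of comparable aligned sites at which the two sequences differ, and those differing sites are exactly the candidate SNPs a two-line comparison reports. The Python sketch below illustrates both ideas on already-aligned sequences, together with the arithmetic implied by a reported FDR. It is not the authors' pipeline: the function names `pairwise_snps` and `expected_true_snps`, the rule of skipping gaps and `N` bases, and the assumption that the FDR applies uniformly to every call are hypothetical, illustrative choices. A real caller working from short reads would also weigh read depth, base quality, and alignments to paralogous copies.

```python
from typing import List, Tuple


def pairwise_snps(seq_a: str, seq_b: str) -> Tuple[List[int], float]:
    """Compare two aligned sequences from two inbred lines (hypothetical helper).

    Returns the 0-based positions where both lines carry an unambiguous
    base (A/C/G/T) and the bases differ, plus pi for this pair: the number
    of differing sites divided by the number of sites compared.
    Gaps ('-') and ambiguous bases such as 'N' are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    snp_positions: List[int] = []
    compared = 0
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a not in valid or b not in valid:
            continue  # not a comparable site
        compared += 1
        if a != b:
            snp_positions.append(i)
    pi = len(snp_positions) / compared if compared else 0.0
    return snp_positions, pi


def expected_true_snps(n_called: int, fdr: float) -> float:
    """Expected true SNPs among n_called calls, assuming a uniform FDR."""
    return n_called * (1.0 - fdr)


if __name__ == "__main__":
    line_a = "ACGT" * 25                      # 100 aligned sites (toy data)
    line_b = line_a[:10] + "T" + line_a[11:]  # one substitution at site 10
    positions, pi = pairwise_snps(line_a, line_b)
    print(positions, pi)                               # [10] 0.01
    print(round(expected_true_snps(126_683, 0.151)))   # 107554
```

In the toy data, one difference in 100 comparable sites gives π = 1%, the lower end of the range reported for maize coding regions. Applied to the counts in the abstract, `expected_true_snps(126683, 0.151)` gives roughly 107,554 true SNPs, but only under the uniform-FDR assumption stated above.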
Intragenic linkage disequilibrium (LD) rapidly declines to nominal levels within 2 kb in a population of diverse maize inbred lines (Remington et al., 2001). Of the ~2500 Mb that make up the maize genome, less than 25% is genic or low-copy-number sequence, with large blocks of highly repetitive DNA such as retrotransposons …

Published in The Plant Genome 2:121–133. Published 10 July 2009. doi: 10.3835/plantgenome2009.01.0002

© Crop Science Society of America, 677 S. Segoe Rd., Madison, WI 53711 USA. An open-access publication. All rights reserved.

M.A. Gore, Dep. of Plant Breeding and Genetics, Cornell Univ., 175 Biotechnology Bldg., Ithaca, NY 14853; M.H. Wright, Dep. of Genetics and Development, Cornell Univ., 102 Weill Hall, Ithaca, NY 14853; E.S. Ersoz, Institute for Genomic Diversity, Cornell Univ., 175 Biotechnology Bldg., Ithaca, NY 14853; P. Bouffard, E.S. Szekeres, and T.P. Jarvie, 454 Life Sciences, 20 Commercial St., Branford, CT 06405; B.L. Hurwitz and A. Narechania, Cold Spring Harbor Lab., 1 Bungtown Rd., Cold Spring Harbor, NY 11724; T.T. Harkins, Roche Applied Science Corp., 9115 Hague Rd., Indianapolis, IN 46250; G.S. Grills, Life Sciences Core Labs. Center, Cornell Univ., 139 Biotechnology Bldg., Ithaca, NY 14853; D.H. Ware, USDA-ARS, Cold Spring Harbor Lab., 1 Bungtown Rd., Cold Spring Harbor, NY 11724; E.S. Buckler, USDA-ARS, Dep. of Plant Breeding and Genetics, Institute for Genomic Diversity, Cornell Univ., 159 Biotechnology Bldg., Ithaca, NY 14853.

All custom code and scripts used in this study are available upon request from M.H. Wright.

M.A. Gore and M.H. Wright contributed equally to this work. Received 14 Jan. 2009. *Corresponding authors …

Similar resources

Determination of fucosyltransferase 3 gene polymorphisms frequency in Iranian blood donors

Abstract Background and Objectives The FUT3 gene regulates the expression of Lewis blood group antigens mainly Lea and Leb. The Lewis negative phenotype, is the result of an inactivated FUT3 enzyme that lacks glycosidase activity. Several single nucleotide polymorphisms (SNPs) may cause enzyme inactivation with different racial distribution. This study aimed to determine the frequency of these...

Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Net-work Analysis

Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...

Large scale production of Blackleg vaccine by fermenter and enriched culture medium in Iran

In all biological systems growth is defined as increase of chemical compounds. Bacteria can achieve to balanced growth if they are growing in a medium, which are completely adapted to it. Clostridium chauvoei, (Clostridium feseri) is an anaerobic, spore forming, motile, and polymorph bacteria, which its size varies from 0.5-1 to 3-8 micron and could be observed as individual bacterium, diplo, a...

Using eQTL weights to improve power for genome-wide association studies: a genetic study of childhood asthma

Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information hence has potential to increase power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP based association tests to improve test po...

Investigation of fimH Single Nucleotide Polymorphisms (C640T and T591A) in Uropathogenic E. coli Isolated from Patients with Urinary Tract Infections

Background: Urinary tract infections are one of the most frequent health problems and Uropathogenic Escherichia coli is the major pathogen resulting UTIs. The severity of UTIs is caused by the expression of a large range of virulence factors. In this study, we evaluated the allelic frequency fimH gene, in UPECs isolated from patients with UTIs. This study also aimed to determine the roles of C64...

Novel Single Nucleotide Polymorphisms (SNPs) in Intron 2 and Exon 3 Regions of Leptin Gene in Sumba Ongole Cattle

The bovine leptin (LEP) gene was widely used as a candidate gene for molecular selection to improve productivity traits of cattle. This study was carried out to identify single nucleotide polymorphisms (SNPs) in the LEP gene of Sumba Ongole (SO, Bos indicus) cows using sequencing method. A total of 31 animals were used in this study for analyses. Research showed that total of 16 SNPs w...


Publication date: 2009